Applying Authorship Analysis to Arabic Web Content
Abstract
The advent and rapid proliferation of internet communication have given rise to numerous security issues. The anonymity of online media such as email, web sites, and forums makes them an attractive channel for criminal activity. Increased globalization and the boundless nature of the internet have amplified these concerns by adding a multilingual dimension. The world's social and political climate has drawn a great deal of attention to Arabic. In this study we apply authorship identification techniques to Arabic web forum messages. Our research uses lexical, syntactic, structural, and content-specific writing style features for authorship identification. We address some of the problematic characteristics of Arabic en route to developing an Arabic language model that provides a respectable level of classification accuracy for authorship discrimination. We also run experiments to evaluate the effectiveness of different feature types and classification techniques on our dataset.
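As a rough illustration of the four feature categories named in the abstract (lexical, syntactic, structural, content-specific), the following Python sketch computes a handful of simple features for one forum message. The word lists, the feature names, and the `extract_features` helper are hypothetical and chosen only for illustration; they are not the feature set used in the study, which is not given here.

```python
import re
from collections import Counter

# Hypothetical, illustrative lists (assumptions, not the study's features).
FUNCTION_WORDS = ["في", "من", "على"]          # common Arabic prepositions
CONTENT_WORDS = ["رسالة", "منتدى"]            # example topic keywords
PUNCTUATION = [".", ",", "!", "?", "،", "؛", "؟"]  # Latin and Arabic marks


def extract_features(message: str) -> dict:
    """Return a flat dict of simple writing-style features for one message."""
    # \w matches Unicode letters, so Arabic words are tokenized as well.
    words = re.findall(r"\w+", message)
    counts = Counter(words)
    n_words = len(words) or 1  # avoid division by zero on empty messages
    paragraphs = [p for p in re.split(r"\n\s*\n", message) if p.strip()]

    features = {
        # Lexical: character- and word-level statistics
        "char_count": len(message),
        "word_count": len(words),
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len(counts) / n_words,
        # Structural: layout of the message
        "line_count": len(message.splitlines()),
        "paragraph_count": len(paragraphs),
    }
    # Syntactic: punctuation counts and relative function-word frequencies
    for mark in PUNCTUATION:
        features[f"punct_{mark}"] = message.count(mark)
    for word in FUNCTION_WORDS:
        features[f"func_{word}"] = counts[word] / n_words
    # Content-specific: raw counts of exact keyword matches
    for word in CONTENT_WORDS:
        features[f"content_{word}"] = counts[word]
    return features
```

Each message then becomes one feature vector that any standard classifier can consume. Note that the keyword and function-word counts rely on exact token matches, so a word carrying an attached prefix (such as the definite article) is not counted; a real system would need some form of normalization for this.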
Similar resources
Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore
Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...
Collusion Set Detection Through Outlier Discovery
Digging in the details: a case study in network data mining; Efficient identification of overlapping communities; Event-driven document selection for terrorism information extraction; Link analysis tools for intelligence and counterterrorism; Mining candidate viruses as potential bio-terrorism weapons from biomedical literature; Private mining of association rules...
Co-authorship network analysis and social network indicators of coronavirus research
Background and aim: The aim of this study was to examine the status of documents related to coronavirus using scientometric indicators and to draw a co-authorship map of the authors, organizations, and countries producing articles, in order to characterize this field as fully as possible. Materials and methods: This applied scientometric study was conducted using social network analysis. The statistical populati...
Co-authorship networks in the digital library research community
The field of digital libraries (DLs) coalesced in 1994: the first digital library conferences were held that year, awareness of the World Wide Web was accelerating, and the National Science Foundation awarded $24 million (U.S.) for the Digital Library Initiative (DLI). In this paper we examine the state of the DL domain after a decade of activity by applying social network analysis to the co-au...
Visualization of scientific co-authorship in Spanish universities: From regionalization to internationalization
Purpose – To visualize the inter-university and international collaboration networks generated by Spanish universities based on the co-authorship of scientific articles. Design/methodology/approach – Formulation based on a bibliometric analysis of Spanish university production from 2000 to 2004 as contained in Web of Science databases, applying social network visualization techniques. The co-auth...
Publication date: 2005